Rapid Rule Compaction Strategies for Global Knowledge Discovery in a Supervised Learning Classifier System
Abstract
Michigan-style learning classifier systems have emerged as a promising modeling and data mining strategy for bioinformaticists seeking to connect predictive variables with disease phenotypes. The resulting ‘model’ learned by these algorithms comprises an entire population of rules, some of which will inevitably be redundant or poor predictors. Rule compaction is a post-processing strategy for consolidating this rule population with the goal of improving interpretation and knowledge discovery. However, existing rule compaction strategies tend to reduce overall rule population performance along with population size, especially in noisy problem domains such as bioinformatics. In the present study we introduce and evaluate two new rule compaction strategies (QRC, PDRC) and a simple rule filtering method (QRF), and compare them to three existing methodologies. These new strategies are tuned to fit a global approach to knowledge discovery in which less emphasis is placed on minimizing rule population size (to facilitate manual rule inspection) and more is placed on preserving performance. This work identifies the strengths and weaknesses of each approach and suggests that PDRC is the most balanced, trading a minimal loss in testing accuracy for significant gains or consistency in all other performance statistics.
Related resources
An Extended Michigan-Style Learning Classifier System for Flexible Supervised Learning, Classification, and Data Mining
Advancements in learning classifier system (LCS) algorithms have highlighted their unique potential for tackling complex, noisy problems, as found in bioinformatics. Ongoing research in this domain must address the challenges of modeling complex patterns of association, systems biology (i.e. the integration of different data types to achieve a more holistic perspective), and ‘big data’ (i.e. sc...
Pareto Inspired Multi-objective Rule Fitness for Noise-Adaptive Rule-Based Machine Learning
Learning classifier systems (LCSs) are rule-based evolutionary algorithms uniquely suited to classification and data mining in complex, multi-factorial, and heterogeneous problems. The fitness of individual LCS rules is commonly based on accuracy, but this metric alone is not ideal for assessing global rule ‘value’ in noisy problem domains and thus impedes effective knowledge extraction. Multi-...
A New Hybrid Architecture for the Discovery and Compaction of Knowledge from Breast Cancer Datasets
This paper reports on a two-fold contribution; first, the introduction of a new compaction algorithm for the rules generated by learning classifier systems that overcomes the disadvantages of previous algorithms in complexity, compacted solution size, accuracy and usability. The second is the new hybrid architecture that integrates learning classifier systems with Rete-based Inference Engines t...
Using Bayesian Classification for Aq-based Learning with Constructive Induction
To obtain potentially interesting patterns and relations from large, distributed, heterogeneous databases, it is essential to employ an intelligent and automated KDD (Knowledge Discovery in Databases) process. One of the most important methodologies is an integration of diverse learning strategies that cooperatively performs a variety of techniques and achieves high quality knowledge. AqBC is a...
AqBC: A Multistrategy Approach for Constructive Induction
In order to obtain potentially interesting patterns and relations from large, distributed, heterogeneous databases, it is essential to employ an intelligent and automated KDD (Knowledge Discovery in Databases) process. One of the most important methodologies is an integration of diverse learning strategies that cooperatively performs a variety of techniques and achieves high quality knowledge. ...